Effects of incorrect computer-aided detection (CAD) output on human decision-making in mammography
To investigate the effects of incorrect computer output on the reliability of the decisions of human users. This work followed an independent UK clinical trial that evaluated the impact of computer-aided detection (CAD) in breast screening. The aim was to use data from this trial to feed into probabilistic models (similar to those used in "reliability engineering") which would identify and assess possible ways of improving the human–CAD interaction. Some analyses required extra data; therefore, two supplementary studies were conducted. Study 1 was designed to elucidate the effects of computer failure on human performance. Study 2 was conducted to clarify unexpected findings from Study 1.
Conservative Confidence Bounds in Safety, from Generalised Claims of Improvement & Statistical Evidence
“Proven-in-use”, “globally-at-least-equivalent” and “stress-tested” are concepts that come up in diverse contexts in the acceptance, certification or licensing of critical systems. Their common feature is that dependability claims for a system in a certain operational environment are supported, in part, by evidence – viz. of successful operation – concerning different, though related, system[s] and/or environment[s], together with an auxiliary argument that the target system/environment offers the same, or improved, safety. We propose a formal probabilistic (Bayesian) organisation for these arguments. Through specific examples of evidence for the “improvement” argument above, we demonstrate scenarios in which formalising such arguments substantially increases confidence in the target system, and show why this is not always the case. Example scenarios concern vehicles and nuclear plants. Besides supporting stronger claims, the mathematical formalisation imposes precise statements of the bases for “improvement” claims: seemingly similar forms of prior beliefs are sometimes revealed to imply substantial differences in the claims they can support.
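One simple instance of how such a formalisation can transfer confidence (a sketch in notation of my choosing, not necessarily the paper's): if the “globally-at-least-equivalent” argument is accepted as certain, any posterior confidence in the source system carries over to the target.

```latex
% Sketch (notation mine): p_S and p_T are the unknown failure probabilities
% of the source and target system/environment pairs; E is the operational
% evidence about the source. Reading "globally at least equivalent" as
% P(p_T <= p_S) = 1, the event {p_S <= eps} implies {p_T <= eps}, hence:
\[
  P(p_T \le p_S) = 1
  \;\Longrightarrow\;
  P(p_T \le \varepsilon \mid E) \;\ge\; P(p_S \le \varepsilon \mid E)
  \quad \text{for every } \varepsilon > 0 .
\]
```

Weaker readings of “improvement” (probabilistic rather than certain dominance) yield correspondingly weaker inequalities, which is one way seemingly similar prior beliefs can end up supporting substantially different claims.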
Validation of Ultrahigh Dependability for Software-Based Systems
Modern society depends on computers for a number of critical tasks in which failure can have very high costs. As a consequence, high levels of dependability (reliability, safety, etc.) are required from such computers, including their software. Whenever a quantitative approach to risk is adopted, these requirements must be stated in quantitative terms, and a rigorous demonstration of their being attained is necessary. For software used in the most critical roles, such demonstrations are not usually supplied. The fact is that the dependability requirements often lie near the limit of the current state of the art, or beyond, in terms not only of the ability to satisfy them, but also, and more often, of the ability to demonstrate that they are satisfied in the individual operational products (validation). We discuss reasons why such demonstrations cannot usually be provided with the means available: reliability growth models, testing with stable reliability, structural dependability modelling, as well as more informal arguments based on good engineering practice. We state some rigorous arguments about the limits of what can be validated with each of these means. Combining evidence from these different sources would seem to raise the levels that can be validated; yet the improvement is not sufficient to solve the problem. It appears that engineering practice must take into account the fact that no solution exists, at present, for the validation of ultra-high dependability in systems relying on complex software.
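The scale of the problem with operational testing alone can be seen from a standard calculation (a back-of-the-envelope sketch, not taken from the paper):

```python
# Back-of-the-envelope sketch (not from the paper). If the true probability
# of failure on demand (pfd) exceeded p, the chance of observing n
# independent failure-free demands would be below (1 - p)^n. So rejecting
# "pfd > p" at confidence c requires (1 - p)^n <= 1 - c.
import math

def demands_needed(p: float, c: float) -> float:
    """Failure-free demands needed to claim pfd <= p at confidence c."""
    return math.log(1 - c) / math.log(1 - p)

for p in (1e-4, 1e-6, 1e-9):
    print(f"pfd <= {p:.0e} at 99% confidence: "
          f"~{demands_needed(p, 0.99):.1e} failure-free demands")

# pfd <= 1e-9 needs ~4.6e9 failure-free demands: at one demand per second,
# well over a century of continuous testing without a single failure.
```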
Modeling software design diversity
Design diversity has been used for many years now as a means of achieving a degree of fault tolerance in software-based systems. Whilst there is clear evidence that the approach can be expected to deliver some increase in reliability compared with a single version, there is no agreement about the extent of this increase. More importantly, it remains difficult to evaluate exactly how reliable a particular diverse fault-tolerant system is. This difficulty arises because assumptions of independence of failures between different versions have been shown not to be tenable: assessment of the actual level of dependence present is therefore needed, and this is hard. In this tutorial we survey the modelling issues here, with an emphasis upon the impact these have upon the problem of assessing the reliability of fault-tolerant systems. The intended audience comprises designers, assessors and project managers with only a basic knowledge of probability, as well as reliability experts without detailed knowledge of software, who seek an introduction to the probabilistic issues in decisions about design diversity.
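The untenability of the independence assumption has a standard explanation in “difficulty function” models of the Eckhardt–Lee kind (the sketch and notation below are mine, offered as context rather than as the tutorial's own derivation):

```latex
% Sketch (notation mine): theta(x) is the probability that a version
% developed at random fails on demand x; X is a randomly chosen demand.
% For two versions A and B developed independently by the same process:
\[
  P(A \text{ and } B \text{ both fail})
    = E\!\left[\theta(X)^2\right]
    = \bigl(E[\theta(X)]\bigr)^2 + \operatorname{Var}\bigl(\theta(X)\bigr)
    \;\ge\; \bigl(E[\theta(X)]\bigr)^2 .
\]
```

Unless every demand is equally “difficult” (theta constant), the variance term is strictly positive, so the naive product of the two pfds underestimates the probability of common failure even when the versions were developed independently.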
A conservative bound for the probability of failure of a 1-out-of-2 protection system with one hardware-only and one software-based protection train
Redundancy and diversity have long been used as means to obtain high reliability in critical systems. While it is easy to show that, say, a 1-out-of-2 diverse system will be more reliable than each of its two individual “trains”, assessing the actual reliability of such systems can be difficult because the trains cannot be assumed to fail independently. If we cannot claim independence of train failures, the computation of system reliability is difficult, because we would need to know the probability of failure on demand (pfd) for every possible demand. These are unlikely to be known in the case of software. Claims for software often concern its marginal pfd, i.e. its average across all possible demands. In this paper we consider the case of a 1-out-of-2 safety protection system in which one train contains software (and hardware), and the other train contains only hardware equipment. We show that a useful upper (i.e. conservative) bound can be obtained for the system pfd using only the unconditional pfd for software together with information about the variation of hardware failure probability across demands, which is likely to be known or estimable. The worst-case result is obtained by “allocating” software failure probability among demand “classes” so as to maximize system pfd.
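A sketch of the kind of worst-case “allocation” the abstract describes. The fractional-knapsack formulation, the within-class independence assumption and all numbers below are my reading and illustration, not necessarily the paper's exact construction:

```python
# Sketch (my reading of the abstract, not the paper's exact construction).
# Demands fall into classes i with known probabilities q[i] and known
# hardware-train pfds h[i]; the two trains are assumed to fail
# independently *within* a class; only the software's marginal pfd
# S = sum(q[i] * s[i]) is known. The worst case maximises
# sum(q[i] * h[i] * s[i]): a fractional knapsack, solved by loading the
# classes with the largest hardware pfd first.

def worst_case_system_pfd(q, h, S):
    """Conservative bound on the 1-out-of-2 system pfd, given marginal software pfd S."""
    budget = S                       # q-weighted software pfd left to place
    bound = 0.0
    for qi, hi in sorted(zip(q, h), key=lambda t: -t[1]):
        si = min(1.0, budget / qi)   # fill the worst class first
        bound += qi * hi * si
        budget -= qi * si
        if budget <= 0:
            break
    return bound

# Illustrative (made-up) demand profile: three classes.
q = [0.90, 0.09, 0.01]
h = [1e-5, 1e-4, 1e-2]
print(worst_case_system_pfd(q, h, S=1e-3))           # conservative bound
print(sum(qi * hi for qi, hi in zip(q, h)) * 1e-3)   # naive product, far smaller
```

In this illustration the conservative bound (1e-5) lies roughly two orders of magnitude above the naive product of the marginal pfds, which is why the allocation argument matters.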
CAD in mammography: lesion-level versus case-level analysis of the effects of prompts on human decisions
Objective: To understand decision processes in CAD-supported breast screening by analysing how prompts affect readers’ judgements of individual mammographic features (lesions). To this end we analysed hitherto unexamined details of reports completed by mammogram readers in an earlier evaluation of a CAD tool.
Material and methods: Assessments of lesions were extracted from 5,839 reports for 59 cancer cases. Statistical analyses of these data focused on what features readers considered when recalling a cancer case and how readers reacted to CAD prompts.
Results: About 13.5% of recall decisions were found to be caused by responses to features other than those indicating actual cancer. Effects of CAD: lesions were more likely to be examined if prompted; the presence of a prompt on a cancer increased the probability of both detection and recall, especially for less accurate readers and subtler cases; lack of prompts made cancer features less likely to be detected; false prompts made non-cancer features more likely to be classified as cancer.
Conclusion: The apparent lack of impact reported for CAD in some studies is plausibly due to CAD systematically affecting readers’ identification of individual features: beneficially for certain combinations of readers and features, and damagingly for others. Mammogram readers do not ignore prompts. Methodologically, assessing CAD by the number of recalled cancer cases may be misleading.
Modeling the probability of failure on demand (pfd) of a 1-out-of-2 system in which one channel is “quasi-perfect”
Our earlier work proposed ways of overcoming some of the difficulties caused by lack of independence in reliability modeling of 1-out-of-2 software-based systems. Firstly, it is well known that aleatory independence between the failures of two channels A and B cannot be assumed, so the system pfd is not a simple product of the channel pfds. However, it has been shown that the probability of system failure can be bounded conservatively by a simple product of pfd_A and pnp_B (the probability that channel B is not perfect) in those special cases where channel B is sufficiently simple to be possibly perfect. Whilst this “solves” the problem of aleatory dependence, the issue of epistemic dependence remains: an assessor’s beliefs about the unknown pfd_A and pnp_B will not treat them as independent. Recent work has partially overcome this problem by requiring only marginal beliefs – at the price of further conservatism. Here we generalize these results. Instead of “perfection” we introduce the notion of “quasi-perfection”: a pfd so small as to be practically equivalent to perfection (e.g. yielding a very small chance of failure over the entire life of a fleet of systems). We present a conservative argument supporting claims about system pfd. We propose further work, e.g. “what if?” calculations to understand exactly how conservative our approach might be in practice, and suggest further simplifications.
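For context, the earlier “possible perfection” bound that this paper generalises can be sketched as follows (the conditional-independence step is a simplifying assumption of this sketch):

```latex
% Sketch: F_A, F_B are the events that channels A and B fail on a random
% demand; NP_B is the event that B is not perfect. A perfect channel never
% fails, so F_B implies NP_B. If A's failing on a demand is independent of
% whether B's development produced a perfect program, then:
\[
  P(\text{system fails})
    = P(F_A \cap F_B)
    \;\le\; P(F_A \cap NP_B)
    = P(F_A \mid NP_B)\, P(NP_B)
    = \mathit{pfd}_A \cdot \mathit{pnp}_B .
\]
```

“Quasi-perfection” replaces the perfection event with a small-pfd event, so the same style of argument can be run without claiming that any real program is literally free of faults.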
Assessing Safety-Critical Systems from Operational Testing: A Study on Autonomous Vehicles
Context: Demonstrating high reliability and safety for safety-critical systems (SCSs) remains a hard problem. Diverse evidence needs to be combined in a rigorous way: in particular, results of operational testing with other evidence from design and verification. Growing use of machine learning in SCSs, by precluding most established methods for gaining assurance, makes evidence from operational testing even more important for supporting safety and reliability claims.
Objective: We revisit the problem of using operational testing to demonstrate high reliability. We use Autonomous Vehicles (AVs) as a current example. AVs are making their debut on public roads: methods for assessing whether an AV is safe enough are urgently needed. We demonstrate how to answer 5 questions that would arise in assessing an AV type, starting with those proposed by a highly-cited study.
Method: We apply new theorems extending our Conservative Bayesian Inference (CBI) approach, which exploit the rigour of Bayesian methods while reducing the risk of involuntary misuse associated (we argue) with now-common applications of Bayesian inference; we define additional conditions needed for applying these methods to AVs.
Results: Prior knowledge can bring substantial advantages if the AV design allows strong expectations of safety before road testing. We also show how naive attempts at conservative assessment may instead lead to over-optimism; why extrapolating the trend of disengagements (take-overs by human drivers) is not suitable for safety claims; and how to use knowledge that an AV has moved to a “less stressful” environment.
Conclusion: While some reliability targets will remain too high to be practically verifiable, our CBI approach removes a major source of doubt: it allows use of prior knowledge without inducing dangerously optimistic biases. For certain ranges of required reliability and prior beliefs, CBI thus supports feasible, sound arguments. Useful conservative claims can be derived from limited prior knowledge.
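For a sense of the scale behind “too high to be practically verifiable”, here is the standard frequentist calculation of the kind made by the highly-cited study mentioned above (this is not the paper's CBI machinery, and the benchmark fatality rate is an illustrative assumption):

```python
# Standard frequentist sketch (not the paper's CBI machinery). Treating
# fatal crashes as a Poisson process with rate r per mile, m fatality-free
# miles occur with probability exp(-r * m) if the true rate is r; so
# claiming "rate < r" at confidence c requires m >= -ln(1 - c) / r.
import math

def miles_needed(r_per_mile: float, confidence: float) -> float:
    """Fatality-free miles needed to claim a fatality rate below r_per_mile."""
    return -math.log(1 - confidence) / r_per_mile

human_rate = 1.1e-8   # assumed benchmark: ~1.1 fatalities per 100 million miles
print(f"{miles_needed(human_rate, 0.95):.2e} fatality-free miles")

# ~2.7e8: hundreds of millions of fatality-free miles merely to match an
# assumed human-driver rate at 95% confidence, before claiming improvement.
```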